The Problem of Unequal Variances


The title "The Problem of Unequal Variances" is too broad; far too little is known about many facets of the problem to attempt to discuss it generally. Rather, we will consider here only solutions to certain aspects of the problem that have been published either quite recently or in such form as to gain popular acceptance slowly. It is part of our responsibility to see that these improved techniques become known.

There isn't one of us here today who wasn't warned in his first course in statistics against the so-called L.S.D. (or "least significant difference") test for the separation of observed sample means, and for most of us that first course was quite some years ago. It was over 15 years ago that Fisher suggested his now crude technique for overcoming some of the bias in that test. However, at least since Tukey's 1949 Biometrics paper, followed by the work of Duncan, Scheffé, Bechhofer, and others, the beast should long since have succumbed. Nevertheless, many of us see almost daily evidence of the continued use of the "old favorite" L.S.D.

Probably the most frequently made decision in experimental science is whether one mean is greater than one or more other means; that is, whether, on the basis of sample evidence, certain population means may be supposed with any reasonable assurance to differ. To aid in making such a decision there are several procedures available, more or less sensitive to the quantity of other available and pertinent information: for example, knowledge of the magnitude of the population variances or of the equality of the population variances, a set of sample estimates of the population variances, or possibly only a feel for the range of possible values. There is no one, I suppose, who, when confronted with sample estimates of population means, does not temper his judgment as to the comparative magnitudes of the population means by some outside estimate of the precision of these means. It may be by intuition, by his faith in the experimenter, by his own past experience, or by some evidence contained in the samples themselves. But the precision of the sample means, and therefore of the differences among them, is in question. For example, we are told that samples from two lots of wool yielded average fiber diameters of 21.40 microns and 31.00 microns, respectively. Is it reasonable to suppose that the mean fiber diameters of the two lots differ?

We are further told that experienced wool graders selected the samples; is the question any easier to answer? We add to our information the knowledge that the same experienced graders, on the basis of these samples, graded the two lots respectively 64s staple and 48/46s staple. At this point any experienced grader would no doubt conclude that the lot mean fiber diameters differed, and he would probably arrive at this conclusion not so much on the basis of the measurements as upon his knowledge, from experience, of the ability of graders to detect fiber diameter differences despite the usual variability among samples and among fibers in samples. Possibly even the knowledge that both lots were graded staple length assists him in appraising the precision with which the graders are able to separate lots of different mean fiber diameters on the basis of samples from those lots. It must be remembered that a fiber, however small, is after all not uniformly cylindrical throughout, and, just as for their human counterparts, "diameter" may be considerably easier to agree upon for the svelte type than for the buxom model. At least an experienced grader is probably well aware of the relationship exhibited in Figure 1, although he might not recognize the language by which we describe it. However, we do not have the benefit of his experience, and even the additional information contained in Figure 1 is still insufficient to answer our question. Not only that, the knowledge from Figure 1 strikes terror to our hearts, but only because we're statisticians. We all knew, of course, as soon as we had the standard deviations, or estimates thereof, that the only missing links were the sample sizes. But now even the knowledge of the sample sizes won't close the gap, because the usual tests are valid only if we can assume equal variances, and this not only on the null but also on the non-null hypothesis. Incidentally, I mention this because Figure 1 was plotted wholly from staple length measurements; for other fiber lengths the standard deviations may be quite different. Thus even on the null hypothesis of equal mean fiber diameters there is no assurance of equal standard deviations. The samples consisted of 600 and 1000 fibers respectively, so the standard errors of the means are respectively

$$\frac{4.63}{\sqrt{600}} \approx 0.189 \qquad \text{and} \qquad \frac{7.25}{\sqrt{1000}} \approx 0.229,$$

and my example explodes! No one could possibly doubt the significance of the difference of over 10 microns between sample means with standard errors of such magnitude; neither should anyone take so many observations to establish the significance of such a difference. The truth of the matter is that the experiment from which these data came was designed "(a) to determine if the size of coring tube used in drawing cores influenced fineness results, (b) to develop a reliable and economical plan for subsampling scoured core residues, and (c) to determine the adequacy of cores drawn for clean yield for estimating the fineness and variability of a lot of graded grease wool." 1/
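A few lines of Python confirm the arithmetic. This is an illustrative sketch with hypothetical variable names; the last step, which treats the two sample means as independent and approximately normal, is an addition made here to show how far apart the means are in standard-error units.

```python
import math

# Summary figures quoted in the text: mean fiber diameter (microns),
# standard deviation (microns), and number of fibers measured, for each lot.
mean_1, sd_1, fibers_1 = 21.40, 4.63, 600
mean_2, sd_2, fibers_2 = 31.00, 7.25, 1000

# Standard error of a sample mean: s / sqrt(n).
se_1 = sd_1 / math.sqrt(fibers_1)
se_2 = sd_2 / math.sqrt(fibers_2)

# Standard error of the difference between two independent means.
se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)
ratio = (mean_2 - mean_1) / se_diff

print(f"standard errors: {se_1:.3f}, {se_2:.3f}")
print(f"difference / standard error: {ratio:.1f}")
```

A difference of more than 30 standard errors is why no formal test is needed in this case.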

A careful reading of the manuscript cited suggests the following interpretations. We may interpret (a) to mean: Is there a significant difference between the mean fiber diameters for the composite 3/8" and the composite 1-1/4" coring methods, lot by lot? (See Table 1.) Objective (b) involves answering the following question: Is there a significant difference between subsampling plans I and II? It also involves answering the question: Do the mean fiber diameters for the "Individual" and "Composite" methods of core sampling differ significantly? We may interpret (c) to mean: Do any of the mean fiber diameters for the various core sampling procedures differ significantly from the mean for the same lot as determined from the card sliver? Supplementally, but not alternatively, the authors may have been interested in whether corresponding differences among standard deviations and among coefficients of variation could be shown to be significant; possibly they may even have been interested in the significance of the differences, from lot to lot, among coefficients of variation for the same coring method.

So the example does, after all, exhibit the two usual aspects of the problem of unequal variances, namely: are the variances homogeneous, and if so, how do we test the differences between means? Thus, for the 48/46s, it is desired among other things to know whether the individual 3/8" core sampling method yielded a mean fiber diameter significantly different from that of the card sliver sample. In light of our condemnation of the L.S.D. test, and certainly from the standpoint of gaining the greatest sensitivity, we should view our objectives collectively. Having already disposed of the question of equality of means and variances between different lots in the negative, suppose we concentrate solely upon the 48/46s and ask first whether the sampling method standard deviations differ, not just for the individual 3/8" and card sliver samples, but collectively.

1/ Core-sampling Grease Wool for Fineness and Variability, D. D. Johnston, W. J. Manning, H. D. Ray, W. A. Mueller, and E. M. Pohle, USDA. In press.

For this purpose we use the familiar Bartlett test 2/:

$$M = n \ln\left(\frac{1}{n}\sum_{i=1}^{k} n_i s_i^2\right) - \sum_{i=1}^{k} n_i \ln s_i^2,$$

where ln indicates logarithm to the base e, $s_i^2$ is the i-th variance estimate, $n_i$ is its number of degrees of freedom, and

$$n = \sum_{i=1}^{k} n_i .$$

In our example $n_i \approx 1000$ for all i; we will assume $n_i = 1000$ for all i. For comparison with the χ² tables, M is divided by a correction factor:

$$M' = \frac{M}{1 + \dfrac{1}{3(k-1)}\left(\displaystyle\sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{n}\right)},$$

where k is the number of variances tested. In our example

$$M' = \frac{M}{1.0004} = 13.33 .$$

Comparing this with the tabular probabilities for χ² with k − 1 = 5 d.f., we find that the probability is only about .02 of exceeding such a value by chance if the variances were in fact equal. One must, at the least, reject the hypothesis of equal standard deviations at the 5 percent level of significance. How many of you would have predicted this, having merely looked at the data in Table 1?
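The computation of M and M' is mechanical enough to sketch in a few lines of Python. The function names below are hypothetical. Because Table 1 is not reproduced here, the usage lines use placeholder variances (the correction factor depends only on the degrees of freedom) and the text's value M' = 13.33. The χ² tail probability uses the standard closed forms for integer degrees of freedom, so no statistical library is needed.

```python
import math

def bartlett(variances, dfs):
    """Return Bartlett's M, the correction factor C, and M' = M / C.

    variances -- the k sample variances s_i^2
    dfs       -- their degrees of freedom n_i
    """
    k = len(variances)
    n = sum(dfs)
    pooled = sum(f * s2 for s2, f in zip(variances, dfs)) / n
    M = n * math.log(pooled) - sum(f * math.log(s2) for s2, f in zip(variances, dfs))
    C = 1 + (sum(1 / f for f in dfs) - 1 / n) / (3 * (k - 1))
    return M, C, M / C

def chi2_upper_tail(x, df):
    """P(chi-square with integer df >= x), from the closed-form series."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(df-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half + (j - 0.5) * math.log(half) - math.lgamma(j + 0.5))
    return total

# C depends only on the d.f., so placeholder variances suffice here.
_, C, _ = bartlett([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1000] * 6)
print(round(C, 4))  # 1.0004
# Upper-tail probability of the text's M' = 13.33 on k - 1 = 5 d.f.
print(round(chi2_upper_tail(13.33, 5), 3))
```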


There was, of course, on the collective null hypothesis of equal means and variances, still another estimate of the standard deviation, namely, that given by the sum of squares of the means about their own mean. Thus

$$s^2 = \frac{1}{k-1}\left[\sum_{i=1}^{k} m_i^2 - \frac{1}{k}\left(\sum_{i=1}^{k} m_i\right)^2\right] = .05018,$$

or, on a per fiber basis, $s^2 = 50.18$ and $s = 7.08$.

2/ For a wholly readable discussion of Bartlett's test see Rao, Advanced Statistical Methods in Biometric Research, pp. 226-230.


It is now clear that, had we known nothing about equality of means or variances and wished to test the collective null hypothesis, we might have done so exactly as above except for the inclusion of one more estimate of variance. The new M is easily computed:

$$M = 6005 \ln\left(\frac{346{,}743.8}{6005}\right) - 24{,}342.77 = 24{,}356.28 - 24{,}342.77 = 13.51$$

and

$$M' = \frac{13.51}{1.0114} = 13.36 .$$


As was to be expected, M has changed little because, even though the new s is less than all the other 6, it carries only 5 d.f. compared with 1000 for each of the other 6. The new M exceeds the 5 percent tabular χ² at 6 d.f., so we conclude at this level of significance that the collective null hypothesis is false. That is, either the means are unequal, the standard deviations (variances) are unequal, or both. This test did not tell us why the collective hypothesis is false, only that it is false. It is easy to see in this instance that the significance of M is due to lack of homogeneity of variances rather than to inequality of means; the estimate of variance derived from the among-methods mean square was less than all the other within-methods estimates. It is also easy to see, however, that the above test would be relatively insensitive to differences between means because of the few degrees of freedom for the mean square among method means.
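The second application can be checked the same way. The sketch below reuses the text's M = 13.51 rather than recomputing it (the six within-method variances are in Table 1, not reproduced here). It converts the among-means variance .05018 to the per-fiber scale, evaluates the correction factor for six estimates on 1000 d.f. plus the among-means estimate on 5 d.f., and computes the χ² tail probability for 6 d.f. from the closed form that holds for even degrees of freedom. Function names are hypothetical.

```python
import math

def bartlett_correction(dfs):
    """Bartlett's correction factor C for variance estimates with the given d.f."""
    k, n = len(dfs), sum(dfs)
    return 1 + (sum(1 / f for f in dfs) - 1 / n) / (3 * (k - 1))

def chi2_upper_tail_even(x, df):
    """P(chi-square >= x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# Variance among the six method means, rescaled to a single fiber
# (each mean is based on 1000 fibers).
s2_fiber = 0.05018 * 1000
s_fiber = math.sqrt(s2_fiber)

# Six within-method estimates on 1000 d.f. each, plus the among-means estimate on 5 d.f.
dfs = [1000] * 6 + [5]
C = bartlett_correction(dfs)
M_prime = 13.51 / C                       # M = 13.51 as computed in the text
p = chi2_upper_tail_even(M_prime, len(dfs) - 1)

print(round(s2_fiber, 2), round(s_fiber, 2))   # 50.18 7.08
print(round(C, 4), round(M_prime, 2))          # 1.0114 13.36
print(p < 0.05)                                # True
```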


Under the collective null hypothesis, but considering non-null hypotheses that permit only unequal means, not unequal variances, we would have used the familiar analysis of variance F test as our criterion. We have shown by our first application of Bartlett's test that the assumptions required for the analysis of variance test are probably invalid. We could, of course, use the analysis of variance test and place our trust in those few eminent statisticians who have dared assert that such a test is probably not too badly biased. But, frankly, I fear their audacity and their eminence would be our only refuge, for, the more I probe, the less factually corroborative evidence I find to support their opinion in general. G. E. P. Box, in the June 1954 Annals of Mathematical Statistics, summarizes some of the recent constructive contributions as follows:

"The problem of the effect of unequal group variances was considered in the case of the t test by Welch (5). He obtained approximate probabilities from which it appeared that the effect was small when the groups were of equal size, but larger when they were different in size. Later some exact probabilities for this case were found by Hsu (9) and another investigation by a different approximate method was made by Grunow (10). Both of these investigations confirmed Welch's results. Quensel (11) considered the one-way analysis of variance classification more generally and obtained an approximate expression for the variance of the F criterion when the group variances differed. He concluded that the test would not be greatly affected if the group sizes were equal.

David and Johnson (12), (13), (14) have discussed the general problem of the power function of analysis of variance criteria when the observations are distributed independently but do not necessarily follow the normal distribution or have constant variance. As a special case, they consider the one-way classification in which the observations are normally distributed but the variances differ from group to group. Their method is different from that given here and is an approximate one. At the time of writing, they have published few numerical results and these (14) are confined to the case in which the sizes of the groups are all equal. Confirming the results of Quensel, only slight changes in probability, from those expected if the assumptions were true, have been found."

On the basis of these works and his own research, he concludes as follows concerning the effect of group-to-group inequality of variance in a one-way classification:

"It appears that if groups are equal, moderate inequality of variance does not seriously affect the test. However, with unequal groups, much larger discrepancies appear."

To be sure, in our immediate example all means are based upon equal numbers of fiber measurements, but this is far from true in the case of the lot of 64s. You will, no doubt, call my attention to the sizable number of degrees of freedom available for estimating the standard deviations. Why not assume that

$$\sum_{i=1}^{k} \frac{(m_i - \bar m)^2}{s_i^2/n_i}$$

is distributed as χ² with k − 1 = 5 d.f., where, of course, $\bar m$ is not $\sum_i m_i / k$ but rather

$$\bar m = \frac{\sum_i m_i/(s_i^2/n_i)}{\sum_i 1/(s_i^2/n_i)} \; ?$$

With this question, my example explodes completely!
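The statistic proposed in the question above can be sketched as follows. This is a hedged illustration only: the function name is hypothetical, $n_i$ is taken as the number of observations behind mean $m_i$ so that $s_i^2/n_i$ estimates the variance of that mean, and, as noted later in the text, referring the result to χ² with k − 1 d.f. is justified only approximately, in large samples, once the unknown variances are replaced by estimates.

```python
def weighted_means_chi2(means, variances, sizes):
    """Return sum_i (m_i - m_bar)^2 / (s_i^2 / n_i) and its nominal d.f., k - 1.

    m_bar is the weighted mean with weights n_i / s_i^2, not the simple average.
    """
    weights = [n / s2 for s2, n in zip(variances, sizes)]
    m_bar = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    statistic = sum(w * (m - m_bar) ** 2 for w, m in zip(weights, means))
    return statistic, len(means) - 1

# Hypothetical figures, for illustration only.
stat, df = weighted_means_chi2([21.4, 21.9, 21.1], [21.4, 25.0, 19.8], [600, 1000, 800])
print(round(stat, 2), df)
```

For two means the statistic reduces to the square of $(m_1 - m_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$, which makes a convenient check.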

The full truth of the matter is that I deliberately selected an example wherein I could dodge the issue, for the simple reason that this is our usual tactic. But no one here is deluded by such trickery. In our real, everyday life we can rarely avoid the unequal subclass numbers problem or the problem of unequal variances, either by the usual tricks you find in the textbooks or by exploding the sample sizes. Until quite recently we have had no satisfactory techniques for handling the comparison of several means with unknown and possibly unequal variances. There is, however, no excuse whatever for our not making use of Welch's results for the comparison of two means.

This is none other than the Behrens-Fisher problem (famous or infamous, as your sympathies lie) over which so much controversy has been waged. Behrens' work appeared in 1929; Fisher's fiducial inference argument, dating from 1930, led to corroboration of Behrens' work in 1941. You may not accept the fiducial test as satisfactory; there are many quite able statisticians who do not. Kendall leaves his reader with "a choice of several attitudes toward the foundations of the fiducial argument: (a) he can accept the argument as involving a new postulate of inference; (b) he can regard it as sanctioned by" essentially Fisher's approach; "or (c) he can, so far as estimates based on a single parameter are concerned, console himself with the thought that results of the process are the same as those given by the theory of confidence intervals." He points out that an α-level Behrens-Fisher fiducial test of the difference between two means does not insure that the user will be correct in 100(1 − α) percent of cases; rather, he recommends Welch's procedure to insure this kind of protection. Welch observes that "... although Fisher's approach has been very much criticised by a number of writers, starting with Bartlett (1936), the critics have not wished to throw doubt on the whole body of results that Fisher includes under the heading of fiducial inference."

See Welch 1947, p. 34. Welch's work dates from 1938, and the results were tabled by Aspin in 1949. It is unfortunate that these tables contain the critical values for only the 1 percent and 5 percent one-tailed tests, corresponding to the 2 percent and 10 percent values for the two-tailed test, but in my opinion these tables should be more widely disseminated and used. The procedure is as easily applied as the more usual t test and in most cases much more appropriate. I want to illustrate its use with a couple of examples and recommend that you use it. This was, principally, what I had in mind when I suggested the title of this paper as a topic for discussion. It is not appropriate here to draw further comparisons between the Behrens-Fisher and Aspin-Welch procedures; however, I hope to include as an appendix an elementary treatment of such a comparison. I do want to include in the body of the paper, however, what we know about the more general case of more than two means.

A problem that has arisen on several occasions in my own office, and that embodies almost all of the difficulties associated with the problem of unequal variances, is the following. Cooperative fluid milk handlers pay their patrons monthly in proportion to the amount of butterfat delivered. The amount of butterfat is computed on the combined basis of total pounds of milk delivered and a Babcock test of the percent butterfat. In an effort to reduce the amount of testing, the procedure has been fairly frequently adopted of blending daily samples into composite samples, adding an inhibitor, and storing under refrigeration for 7, 10, or even 15 days before making the Babcock test. It has been fairly well established that through storage the composites generally lead to estimates of monthly butterfat deliveries that are biased downward. The amount of the bias is usually slight but in general increases with increased storage time, varies somewhat directly with the percent butterfat, and varies from producer to producer and, for the same producer, from season to season (with temperature). Add to this the fact that the percent butterfat itself varies more or less periodically, pretty much in phase with the seasons.

The kind of data one is usually confronted with is similar to that in Table 4, expanded somewhat to include more months and more producers but rarely expanded in the direction of more measurements per day. It is quite natural to expect that the variances of monthly means for the four sampling methods are far from equal. We will see in a moment that in our example the variance among daily tests within any month is quite different from that among 7-day composite tests. Keeping in mind our comments on the L.S.D. test, and reminding you again of the changes from producer to producer and from month to month in both means and variances, I would like to throw out for your suggestions the question of how best to analyze such data so as to ascertain whether the compositing procedure does in fact lead to biased results and, if so, whether the amount of bias is related to the length of the compositing period. I am quite serious in soliciting your remarks as to proper techniques for handling such data.

For our present purposes, we will be interested less in the practical question of bias and how to test for it than in a simple application of the test for the difference between two means when the variances cannot be assumed to be equal. Table 4 shows the results of butterfat tests for one producer for the months of August and September of 1951 and for June of 1952. Let us first test whether the difference between the mean for the two-month Fall 1951 season and the mean for June of the following year is significantly positive. The relevant statistics are:

$$
\begin{aligned}
m_1 &= 4.1836, & m_2 &= 3.8700,\\
s_1^2 &= .24073, & s_2^2 &= .02148,\\
n_1 &= 60, & n_2 &= 29,
\end{aligned}
$$

where the m's represent the sample means, the $s^2$'s represent the sample estimates of variance, and the n's are the respective degrees of freedom.


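If we could assume equal variances we would compute

$$t = \frac{m_1 - m_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}\left(\dfrac{1}{n_1+1} + \dfrac{1}{n_2+1}\right)}} = 3.42$$

and compare with tabular values in a t table at $n_1 + n_2 = 89$ d.f., or with tabular values in a table of areas under the Normal curve. Alternatively, one might compute

$$t' = \frac{m_1 - m_2}{\sqrt{\dfrac{s_1^2}{n_1+1} + \dfrac{s_2^2}{n_2+1}}} = 4.59 .$$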
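Both statistics can be sketched directly from the formulas above. The function names are hypothetical; as in the text, $n_1$ and $n_2$ are degrees of freedom, so the sample sizes are $n_1 + 1$ and $n_2 + 1$.

```python
import math

def pooled_t(m1, m2, s1_sq, s2_sq, n1, n2):
    """Student t assuming equal variances; n1, n2 are degrees of freedom."""
    pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2)
    return (m1 - m2) / math.sqrt(pooled * (1 / (n1 + 1) + 1 / (n2 + 1)))

def welch_t_prime(m1, m2, s1_sq, s2_sq, n1, n2):
    """Welch's t': the variance of each mean is estimated separately."""
    return (m1 - m2) / math.sqrt(s1_sq / (n1 + 1) + s2_sq / (n2 + 1))

# Two-month Fall season versus the following June, figures from the text
fall_vs_june = (4.1836, 3.8700, 0.24073, 0.02148, 60, 29)
print(round(pooled_t(*fall_vs_june), 2))       # 3.42
print(round(welch_t_prime(*fall_vs_june), 2))  # 4.59
```

When $n_1 = n_2$ the two functions return the same value whatever the variances, which is the equality t = t' noted below.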
There is some question about what d.f. to use in testing t'. Some people use $n_1 + n_2$; others use the minimum of $n_1$ and $n_2$; still others always refer t' to Normal tables, perhaps even for small sample sizes. Welch has proposed as the effective d.f., f, for t'

$$\frac{1}{f} = \frac{c^2}{n_1} + \frac{(1-c)^2}{n_2}, \qquad 0 \le c \le 1,$$

so that f always lies between the smaller of the $n_i$ and their sum. c will be defined in a moment. See Welch (1949), p. 296.
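Welch's formula is simple to evaluate. The sketch below (function name hypothetical) also illustrates the claim that f always lies between the smaller of $n_1$, $n_2$ and their sum; the largest value is reached at $c = n_1/(n_1 + n_2)$.

```python
def welch_effective_df(c, n1, n2):
    """Welch's effective d.f.: 1/f = c^2/n1 + (1 - c)^2/n2, with 0 <= c <= 1."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie between 0 and 1")
    return 1.0 / (c ** 2 / n1 + (1 - c) ** 2 / n2)

# f runs from n2 (c = 0) up to n1 + n2 (c = n1 / (n1 + n2)) and back down to n1 (c = 1).
for c in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(c, round(welch_effective_df(c, 10, 10), 2))
```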


It is quite well known that neither t nor t' is distributed as Student's t if $\sigma_1 \ne \sigma_2$; if $\sigma_1 = \sigma_2$, t is distributed as Student's t while t' is not, unless in addition $n_1 = n_2$, in which case t = t'. It may be because of the equality in the last case that the degrees of freedom ascribed to t' are sometimes taken to be $n_1 + n_2$; clearly the use of the minimum of $n_1$ and $n_2$ is an attempt to provide a conservative test. If it is known that $\sigma_1 = \sigma_2$, then t is surely the statistic to use. If, however, $\sigma_1$ and $\sigma_2$ differ, then t may give very misleading results and it will be safer to use t'. Figure 2 is a reproduction of Welch's Figure 1 1/ from the reference cited below and shows that the unknown ratio of variances θ need not differ much from unity before v (our t') becomes less biased than u (our t), at least for the case illustrated ($n_1 = 4$). Welch's test, which is to be recommended, employs the statistic t' but requires that the significance of the statistic be judged by comparison with the Aspin-Welch tables as provided in Appendix Tables 2 and 3. The only calculation required to enter the table is

$$c = \frac{s_2^2/(n_2+1)}{s_1^2/(n_1+1) + s_2^2/(n_2+1)} = .15 .$$
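The table argument c is the share of the estimated variance of $m_1 - m_2$ contributed by one of the two samples. A minimal sketch for the first butterfat example, with a hypothetical function name and, as in the text, the June sample in the numerator:

```python
def variance_share(s_sq, n, other_s_sq, other_n):
    """Share of the estimated variance of m1 - m2 contributed by one sample.

    s_sq, n             -- variance estimate and d.f. of the sample in the numerator
    other_s_sq, other_n -- the same for the other sample
    """
    part = s_sq / (n + 1)
    return part / (part + other_s_sq / (other_n + 1))

c = variance_share(0.02148, 29, 0.24073, 60)
print(round(c, 2))  # 0.15
```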

Again you will hasten to point out to me that both t and t' are well beyond any of the tabular values for either the 5 percent or 1 percent Student t or Aspin-Welch t', respectively. Both exceed the one-tailed 1 percent Normal deviate. However, I would like to counter with the reminder that t is just under 3-1/2 while t' just exceeds 4-1/2; the areas under the Normal curve beyond the standard Normal deviates 3-1/2 and 4-1/2, though both quite small, are quite different in relative magnitude.
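The two Normal tail areas can be computed with the standard library, using the identity $P(Z > z) = \tfrac12 \operatorname{erfc}(z/\sqrt2)$. The function name is hypothetical.

```python
import math

def normal_upper_tail(z):
    """Area under the standard Normal curve beyond z: 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_t = normal_upper_tail(3.5)        # t is just under 3-1/2
p_t_prime = normal_upper_tail(4.5)  # t' just exceeds 4-1/2
print(f"{p_t:.2e}  {p_t_prime:.2e}  ratio about {p_t / p_t_prime:.0f}")
```

The two areas differ by a factor of roughly seventy: both results are "significant," but they do not carry the same weight of evidence.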


As a second example, let's perform the same test for the 7-day composites. Here the relevant statistics are

$$
\begin{aligned}
m_1 &= 4.175, & m_2 &= 3.850,\\
s_1^2 &= .07643, & s_2^2 &= .00333,\\
n_1 &= 7, & n_2 &= 3,
\end{aligned}
$$

from which we find that t = 2.28 and t' = 3.19. The first of these, when compared with the one-tailed tabular Student t with 10 d.f., falls far short of significance at the 1 percent level; in fact the probability of a Student t as large as or larger than 2.28 exceeds .02. On the other hand, t' = 3.19 at 10 d.f. is significant at the 1 percent level; the probability of a Student t as large as or larger than 3.19 is less than .005. As in the previous example, there is little need for computing the ratio

$$c = \frac{s_2^2/(n_2+1)}{s_1^2/(n_1+1) + s_2^2/(n_2+1)}$$

because t' exceeds every entry in the Aspin-Welch table of one-tailed 1 percent level test values.

1/ Welch, B. L. Biometrika, 1938.

In passing, we might note that examination of the differences between daily mean tests and the corresponding 7-day composite tests does not, in this particular instance, lead one to suspect unequal variances; furthermore, the amount of bias, though positive, is not sufficiently large to establish its significance on so few degrees of freedom.
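The second example can be reproduced as below. Two assumptions are made here: $n_1 = 7$ is inferred from the 10 d.f. quoted together with $n_2 = 3$, and, since the standard library has no Student t distribution, the tail probability is obtained by integrating the t density numerically with Simpson's rule. This is a sketch, not a substitute for tables or a statistics package. Function names are hypothetical.

```python
import math

def pooled_t(m1, m2, s1_sq, s2_sq, n1, n2):
    """Student t assuming equal variances; n1, n2 are degrees of freedom."""
    pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2)
    return (m1 - m2) / math.sqrt(pooled * (1 / (n1 + 1) + 1 / (n2 + 1)))

def welch_t_prime(m1, m2, s1_sq, s2_sq, n1, n2):
    """Welch's t': the variance of each mean is estimated separately."""
    return (m1 - m2) / math.sqrt(s1_sq / (n1 + 1) + s2_sq / (n2 + 1))

def t_density(x, df):
    """Student t probability density with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T >= x) for x >= 0: 0.5 minus the integral of the density over [0, x] (Simpson)."""
    h = x / steps
    total = t_density(0.0, df) + t_density(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

composites = (4.175, 3.850, 0.07643, 0.00333, 7, 3)  # n1 = 7 is an inferred value
t = pooled_t(*composites)
t_prime = welch_t_prime(*composites)
print(round(t, 2), round(t_prime, 2))
print(round(t_upper_tail(2.28, 10), 4), round(t_upper_tail(3.19, 10), 4))
```

From the rounded figures as printed, t comes out at about 2.27 rather than the 2.28 quoted; either way its tail probability on 10 d.f. exceeds .02, while that of t' = 3.19 is below .005.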


In order to illustrate further the use of the Aspin-Welch tables, suppose we test for the difference between means in the following example:

$$
\begin{aligned}
m_1 &= 60.3, & m_2 &= 47.1,\\
s_1^2 &= 51, & s_2^2 &= 340,\\
n_1 &= 9, & n_2 &= 6.
\end{aligned}
$$

Computing

$$t' = \frac{m_1 - m_2}{\sqrt{s_1^2/(n_1+1) + s_2^2/(n_2+1)}} = \frac{13.2}{7.326} = 1.80$$

and entering Table 3 with

$$c = \frac{s_1^2/(n_1+1)}{s_1^2/(n_1+1) + s_2^2/(n_2+1)} = \frac{5.1}{(7.326)^2} = .095 \approx .1, \qquad f_1 = 9, \quad f_2 = 6,$$

we find that t' (Welch's v) needs to be 1.90 to be significant at the 5 percent level. The one-tailed Student t required for 15 d.f. at the 5 percent level of significance is only 1.753. Here, then, interpreting t' as a Student t leads falsely to the conclusion that the means differ significantly at the 5 percent level. If instead of t' we had computed

$$t = \frac{13.2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}\left(\dfrac{1}{n_1+1} + \dfrac{1}{n_2+1}\right)}} = \frac{13.2}{6.361} = 2.08,$$

again we would have been led to the erroneous conclusion that the means differ significantly. It might be worth noting in passing that Welch's proposed effective degrees of freedom are f = 6, for which the 5 percent one-tailed critical value is 1.94, which agrees fairly closely with the value 1.90 from Table 3.
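The third example can be reproduced from its summary figures. The sketch below recomputes t', c, and the pooled t; the 5 percent Aspin-Welch point 1.90 and the Student t point 1.753 are copied from the text rather than computed.

```python
import math

# Third example from the text: means, variance estimates, degrees of freedom
m1, s1_sq, n1 = 60.3, 51.0, 9
m2, s2_sq, n2 = 47.1, 340.0, 6

# Welch's t' and the table argument c
var_mean_1 = s1_sq / (n1 + 1)          # 5.1
var_mean_2 = s2_sq / (n2 + 1)
se_welch = math.sqrt(var_mean_1 + var_mean_2)
t_prime = (m1 - m2) / se_welch
c = var_mean_1 / (var_mean_1 + var_mean_2)

# Pooled (equal-variance) t for comparison
pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2)
se_pooled = math.sqrt(pooled * (1 / (n1 + 1) + 1 / (n2 + 1)))
t = (m1 - m2) / se_pooled

print(f"t' = {t_prime:.2f} (se {se_welch:.3f}), c = {c:.3f}")
print(f"t  = {t:.2f} (se {se_pooled:.3f})")
# t' clears the Student t point on 15 d.f. (1.753) but not the Aspin-Welch point (1.90)
print(t_prime > 1.753, t_prime > 1.90)  # True False
```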


The use of the Aspin-Welch tables for estimating confidence intervals is precisely the same as the use of the Student t tables for the same purpose. Thus, in the example just completed, we would have: (1) in keeping with our use of the one-tailed test, a lower confidence limit of zero for the difference between the two means and (2), because

$$P\{-\infty < t' < 1.90\} = .95, \quad \text{i.e.,} \quad P\left\{-\infty < \frac{m_1 - m_2}{7.326} < 1.90\right\} = .95,$$

the upper 95 percent confidence limit for the difference is 7.326 × 1.90 = 13.92.
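The final multiplication is shown below. As in the text, the one-tailed 5 percent Aspin-Welch point 1.90 is simply taken from Table 3 (not reproduced here); the code does not compute it.

```python
import math

# Third example: estimated standard error of m1 - m2 used in Welch's t'
se_welch = math.sqrt(51.0 / 10 + 340.0 / 7)
aspin_welch_5pct = 1.90   # one-tailed 5 percent point read from Table 3 in the text

upper_limit = aspin_welch_5pct * se_welch
print(round(upper_limit, 2))  # 13.92
```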

In closing the discussion of the test of significance for the difference between two means, I should point out to those interested in the mathematical aspects of the problem the elegant treatment by Hsu in the reference cited.

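A few highlights from Hsu's paper are the following. Define θ in terms of the ratio of the two population variances, let $N_1 = n_1 + 1$ and $N_2 = n_2 + 1$ be the sample sizes, let λ measure the difference between the two population means (λ = 0 when the means are equal), and let

$$u = \frac{(m_1 - m_2)^2}{A_1 S_1 + A_2 S_2},$$

where $S_i$ is the sum of squares of deviations within the i-th sample. Hsu gets the distribution of u in general. Special cases are:

(1) $u_1$ = our $t^2$, which comes from taking $A_1 = A_2 = N/[(N-2)N_1 N_2]$, with $N = N_1 + N_2$;

(2) $u_2$ = our $t'^2$, which comes from taking $A_i = 1/[N_i(N_i - 1)]$.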

In particular, he shows:

(1) For any $N_1$, $N_2$ and fixed $\omega$, the power of $u_1$ increases with $\lambda$.

(2) $\beta_i(\lambda, \omega) - \beta_i(\omega)$ increases with $\omega$ for $\omega > 1$.

(3) $\beta(\lambda, \omega; N_1, N_2) = \beta(\lambda, 1/\omega; N_2, N_1)$.

(4) For $\lambda = 0$, $\beta_i(0, \omega) = \beta_i(\omega)$ = size of the test, in contrast to the power.

  (a) For $u_2$ (or $t'$) and for $N_1 < N_2$, $\beta_2(\omega)$, plotted against $\log \omega$, takes the form of a curve (figure not reproduced) with a point M at $\omega = N_1(N_1 - 1)/[N_2(N_2 - 1)]$, but the infinite branch of the curve, as $\log \omega \to \infty$, may not rise as high as $\beta_2(1)$. If $N_1 > N_2$ the curve is simply reflected in the vertical axis.

  (b) The form of $\beta_1$ is similar to that of $\beta_2$ except that, for $N_2 - N_1 \ge 2$, the curve declines as $\ln \omega \to \infty$.

  (c) For $N_1 = N_2$, $u_1 = u_2$ (that is, $t = t'$), so $\beta_1 = \beta_2$; point M moves to zero and, of course, the curve is symmetrical.

His conclusions are as follows:

(A) If the hypothesis to be tested is $H_1$: $\lambda = 0$, $\omega = 1$ (not the hypothesis we have considered in this paper, namely $\lambda = 0$), then in sampling from certain populations $u_1$, or $t$, will reject $H_1$ less frequently when it is false than when it is true, which is clearly a most unsatisfactory result. $u_2$, or $t'$, is less seriously biased, and over a considerably more restricted domain of points $(\lambda, \omega)$, but is likely to be less sensitive to variations of $\omega$. For $N_1 = N_2$ (i.e., $u_1 = u_2$) the common test is unbiased but quite insensitive to variations in $\omega$. Hsu would rule out both $u_1$ and $u_2$ as inadequate for testing $H_1$.

(B) If the hypothesis to be tested is $H_2$: $\lambda = 0$ (the test discussed in this paper), both $u_1$ and $u_2$ have a minimum at $\lambda = 0$ (i.e., are unbiased) but do not have distributions that are independent of $\omega$, so it is impossible to control the type I error. "The test $u_2$ has the advantage over $u_1$ because it is likely to be more insensitive to the variation of $\omega$. ... If $N_1 = N_2 = n$ and if $n$ is not very small, then $u_1$ (= $u_2$) is so indifferent to the variation of $\omega$ that we may safely use it to test $H_2$ just as if $\omega$ were known to be unity. In fact, if, say, $n = 10$, then, in using the $u_1$-test for $H_2$ and taking the significance level as the 5 percent level of $t$, we can say that (i) the test is unbiased, (ii) whenever $H_2$ is true, the chance of rejecting it is between 0.05 and 0.065, and (iii) if $\omega = 1$, then there is no other unbiased test, adjusted to give the same chance 0.05 of the first kind error, which is more able to detect a departure of $\lambda$ from 0."
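Statement (ii) in the quotation can be checked roughly by simulation. The sketch below is a minimal Monte Carlo illustration under stated assumptions, not Hsu's own method: it draws two normal samples of size n = 10 with equal means (so $H_2$ is true) and variance ratio $\omega = \sigma_1^2/\sigma_2^2$, applies the pooled-variance t test at the two-sided 5 percent point of t on 18 d.f. (2.101, from standard tables), and records how often $H_2$ is rejected. The function names, replication count and seed are illustrative choices.

```python
import math
import random

# Two-sided 5 percent point of Student's t on n1 + n2 - 2 = 18 d.f. (standard tables).
# Only valid for n1 = n2 = 10.
T_CRIT_18DF = 2.101


def mean_and_variance(sample):
    """Sample mean and unbiased (n - 1 divisor) sample variance."""
    n = len(sample)
    mean = sum(sample) / n
    return mean, sum((v - mean) ** 2 for v in sample) / (n - 1)


def pooled_t(x, y):
    """Classical two-sample t statistic using the pooled variance estimate."""
    n1, n2 = len(x), len(y)
    m1, v1 = mean_and_variance(x)
    m2, v2 = mean_and_variance(y)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def rejection_rate(omega, n=10, reps=10_000, seed=1):
    """Monte Carlo estimate of P(|t| > T_CRIT_18DF) when both population means
    are equal (H2 true) and sigma1**2 / sigma2**2 == omega, with n1 = n2 = n."""
    rng = random.Random(seed)
    sd1 = math.sqrt(omega)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(pooled_t(x, y)) > T_CRIT_18DF:
            rejections += 1
    return rejections / reps


for omega in (1, 4, 100):
    print(f"omega = {omega:>3}: estimated size = {rejection_rate(omega):.3f}")
```

If Hsu's bounds hold, the printed rates should fall roughly between 0.05 and 0.065, allowing for Monte Carlo error of about ±0.005 at 10,000 replications. The upper end corresponds to very large ω, where the pooled statistic behaves approximately like t on only 9 d.f.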

The problem of testing the equality of several means is, of course, much more difficult than that of testing the equality of two means. If the problem of two means has itself proved somewhat intractable, and satisfactory methods for handling it have been made available only relatively recently, it is natural to expect that very little is known about how to handle the more difficult problem. There are, in fact, only a few papers that provide the applied statistician with any constructive suggestions.

We noted a little earlier that the quantity

$$\sum_{i=1}^{k} \frac{n_i}{\sigma_i^2}(m_i - m)^2$$

is distributed as $\chi^2$ with $k - 1$ d.f. if the $k$ unknown population means are all equal, without any assumption about equality of the $\sigma_i^2$. This provides the basis for a convenient large-sample approximate test: replace the $\sigma_i^2$ by their sample estimates $s_i^2$ (in the coefficients and in $m$) and still use the $\chi^2$ tables to judge significance. The resulting statistic poses a further challenge, namely obtaining its small-sample distribution, and this is not at all easy. G. S. James, in a logical attempt to studentize the problem, replaces the probability

$$P\left\{\sum_{i=1}^{k} \frac{n_i}{\sigma_i^2}(m_i - m)^2 > \chi^2_\alpha\right\} = \alpha$$

by the corresponding probability

$$P\left\{\sum_{i=1}^{k} \frac{n_i}{s_i^2}(m_i - m)^2 > f(s_i^2/n_i)\right\} = \alpha$$

and obtains for the required function $f$ a power series in the $s_i^2$. Taking only the first corrective terms (terms through those of order $-1$ in the $n_i$), he recommends, as an improvement over the approximate test, the use of

$$\chi^2\left[1 + \frac{3\chi^2 + k + 1}{2(k^2 - 1)}\sum_{i=1}^{k}\frac{1}{n_i - 1}\left(1 - \frac{w_i}{w}\right)^2\right]$$

as the critical value of the statistic $\sum_i w_i(m_i - m)^2$, where $\chi^2$ is the upper $\alpha$ point of $\chi^2$ with $k - 1$ d.f., $w_i = n_i/s_i^2$, $w = \sum_i w_i$, and $m = \sum_i w_i m_i / w$. (James also gives the term of order $-2$ in the $n_i$ but notes that, because of its complexity, it will likely prove of not too much practical utility.)
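Both the uncorrected large-sample test and James' first-order correction need only the weights $w_i = n_i/s_i^2$. The sketch below is a minimal illustration, assuming the upper point of $\chi^2$ on $k - 1$ d.f. is read from tables and passed in (the Python standard library has no $\chi^2$ quantile function); the function names are hypothetical. As check data it uses the three treatments of Welch's Example 1, with the first treatment mean read as 27.8, the value implied by the entry 0.932 = 0.333 × 2.8 in that example's Table 2.

```python
def weighted_spread(ns, means, variances):
    """Statistic sum_i w_i * (m_i - m)**2, where w_i = n_i / s_i**2 and m is
    the w-weighted mean of the group means m_i."""
    weights = [n / s2 for n, s2 in zip(ns, variances)]
    w = sum(weights)
    m = sum(wi * mi for wi, mi in zip(weights, means)) / w
    return sum(wi * (mi - m) ** 2 for wi, mi in zip(weights, means))


def james_first_order_critical(ns, variances, chi2_point):
    """James' first-order critical value for weighted_spread(...):
    chi2 * [1 + (3*chi2 + k + 1) / (2*(k**2 - 1)) * sum_i (1 - w_i/w)**2 / (n_i - 1)],
    where chi2_point is the tabulated upper point of chi-square on k - 1 d.f."""
    k = len(ns)
    weights = [n / s2 for n, s2 in zip(ns, variances)]
    w = sum(weights)
    a = sum((1 - wi / w) ** 2 / (n - 1) for wi, n in zip(weights, ns))
    return chi2_point * (1 + (3 * chi2_point + k + 1) / (2 * (k ** 2 - 1)) * a)


# Welch's Example 1 figures (first mean read as 27.8; see the note above).
ns, means, variances = [20, 10, 10], [27.8, 24.1, 22.2], [60.1, 6.3, 15.4]
CHI2_05_2DF = 5.991  # upper 5 percent point of chi-square on k - 1 = 2 d.f., from tables

stat = weighted_spread(ns, means, variances)
james = james_first_order_critical(ns, variances, CHI2_05_2DF)
print(f"statistic = {stat:.2f}")
print(f"uncorrected chi-square point = {CHI2_05_2DF}")
print(f"James first-order critical value = {james:.2f}")
```

With these figures the statistic is about 6.90: it exceeds the uncorrected 5 percent point 5.991 but falls just short of James' first-order critical value of about 6.96, so the small-sample correction reverses the verdict at the 5 percent level.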


Welch, as in his earlier work on the problem of two means, approximates the distribution of James' statistic by comparing its moment generating function with that of the familiar F distribution. He defines a new statistic

$$V = \frac{\sum_{i=1}^{k} w_i(m_i - m)^2/(k - 1)}{1 + \dfrac{2(k - 2)}{k^2 - 1}\sum_{i=1}^{k}\dfrac{1}{n_i - 1}\left(1 - \dfrac{w_i}{w}\right)^2}$$

the significance of which is to be judged by entering the F table with degrees of freedom $k - 1$ and

$$\left[\frac{3}{k^2 - 1}\sum_{i=1}^{k}\frac{1}{n_i - 1}\left(1 - \frac{w_i}{w}\right)^2\right]^{-1},$$

and shows that to order $-1$ in the $n_i$ this procedure is equivalent to James' procedure. "To higher orders, of course, the two procedures differ. There are obvious points which could be mentioned in favor of either method, but no extensive numerical work has been done to compare them." Example 1, appended, is an excerpt from Welch's paper.
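Welch's statistic and its second degrees-of-freedom parameter follow directly from the formulas in the preceding paragraph. A minimal sketch, assuming the group sizes $n_i$, means $m_i$ and variances $s_i^2$ are available; `welch_v` is a hypothetical name, and the percentage point of F must still be taken from tables.

```python
def welch_v(ns, means, variances):
    """Welch's statistic V for k means with unequal variances, with its
    approximate F degrees of freedom (k - 1, f2). Uses w_i = n_i / s_i**2 and
    n_i - 1 d.f. for each variance estimate."""
    k = len(ns)
    weights = [n / s2 for n, s2 in zip(ns, variances)]
    w = sum(weights)
    m = sum(wi * mi for wi, mi in zip(weights, means)) / w
    # a = sum_i (1 - w_i/w)**2 / (n_i - 1), shared by the correction and by f2
    a = sum((1 - wi / w) ** 2 / (n - 1) for wi, n in zip(weights, ns))
    between = sum(wi * (mi - m) ** 2 for wi, mi in zip(weights, means)) / (k - 1)
    v = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * a)
    f2 = (k ** 2 - 1) / (3 * a)
    return v, k - 1, f2


v, f1, f2 = welch_v([20, 10, 10], [27.8, 24.1, 22.2], [60.1, 6.3, 15.4])
print(f"V = {v:.2f} on ({f1}, {f2:.1f}) d.f.")
```

Run on the Example 1 figures (first mean 27.8), it prints `V = 3.35 on (2, 22.6) d.f.`, agreeing with the hand calculation in the example.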


Related to the problem of how to test for equality among several means when variances may not be equal is the question of how inequality of variances affects the analysis-of-variance test of equality of means. The papers by Box, Quensel, and David and Johnson cited in the references are concerned more with this aspect of the problem than with how properly to test for equality.

An interesting example of the Behrens-Fisher problem extended to multivariate dimensions was considered recently by Fraser /20/ in comparing $q$ of $p + q$ regression coefficients in one normal population with the corresponding $q$ coefficients in another population, where the respective residual variances could not be assumed to be equal.

We have spoken here today only of testing collectively for equality of means. Once the question of equality has been decided in the negative, there immediately arises the question of how to separate the disparate means. In case variances cannot be assumed to be equal, we know little or nothing about the solution to this problem; if variances are equal, there is a choice of recently developed methods to apply. A discussion of the relative merits of these methods is to be the subject of one of our seminars in the very near future.
Figure 1. Comparison of Mean and Standard Deviation
Figure 2. Comparison of Usual Tests
Figure 3. Comparison of Usual Tests
Table 1. Average fiber diameter, standard deviation, and coefficient of variation for scoured core and card sliver samples.

Grade of lot  Number   Cores     Procedure for      Plan of   Number of  Average    Standard   Coefficient
              of bales drawn     core sampling      sub-      fibers     fiber      deviation  of variation
              cored    per bale                     sampling  measured   diameter
                                                                         (microns)  (microns)  (percent)

64s staple    10       10        Composite 3/8"     I         600        21.78      4.80       22.04
                                                    II        600        21.40      4.63       21.64
              10       5         Individual 3/8"              5,000      21.76      5.05       23.21
              10       10        Composite 1-1/4"   I         600        21.70      4.85       22.35
                                                    II        600        21.70      4.71       21.70
                                 Card sliver                  600        21.90      4.63       21.14

56/58s staple 10       8         Composite 3/8"     I         1,000      26.35      6.53       24.78
                                                    II        1,000      26.58      6.80       25.58
              10       5         Individual 3/8"              5,000      26.61      6.70       25.18
              10       8         Composite 1-1/4"   I         1,000      26.98      5.80       21.50
                                                    II        1,000      26.50      6.48       24.45
                                 Card sliver                  1,000      26.60      6.33       23.80

48/46s staple 2        8         Composite 3/8"     I         1,000      31.80      7.25       22.80
                                                    II        1,000      32.38      7.63       23.56
              2        5         Individual 3/8"              1,000      32.40      7.55       23.30
              2        8         Composite 1-1/4"   I         1,000      32.30      8.05       24.92
                                                    II        1,000      32.25      7.38       22.88
                                 Card sliver                  1,000      32.12      7.71       24.00
Table 2. Value of $v = (y - \eta)/\sqrt{\lambda_1 s_1^2 + \lambda_2 s_2^2}$ exceeded with probability $\epsilon = 0.01$*
[or of $|v|$ exceeded with probability $2\epsilon = 0.02$]

Columns: $\lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$

  f2   f1   0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
  10   10  2.76  2.70  2.63  2.56  2.51  2.50  2.51  2.56  2.63  2.70  2.76
       12  2.76  2.70  2.63  2.56  2.51  2.49  2.49  2.52  2.57  2.62  2.68
       15  2.76  2.70  2.63  2.56  2.51  2.48  2.47  2.48  2.52  2.56  2.60
       20  2.76  2.70  2.63  2.56  2.51  2.47  2.45  2.45  2.47  2.49  2.53
       30  2.76  2.70  2.63  2.56  2.50  2.46  2.43  2.42  2.42  2.44  2.46
        ∞  2.76  2.70  2.63  2.56  2.50  2.44  2.40  2.36  2.34  2.33  2.33

  12   10  2.68  2.62  2.57  2.52  2.49  2.49  2.51  2.56  2.63  2.70  2.76
       12  2.68  2.62  2.57  2.52  2.48  2.47  2.48  2.52  2.57  2.62  2.68
       15  2.68  2.62  2.57  2.52  2.48  2.46  2.46  2.48  2.52  2.56  2.60
       20  2.68  2.62  2.57  2.52  2.48  2.45  2.44  2.45  2.47  2.49  2.53
       30  2.68  2.62  2.57  2.52  2.47  2.44  2.42  2.41  2.42  2.44  2.46
        ∞  2.68  2.62  2.57  2.51  2.46  2.42  2.38  2.36  2.34  2.33  2.33

  15   10  2.60  2.56  2.52  2.48  2.47  2.48  2.51  2.56  2.63  2.70  2.76
       12  2.60  2.56  2.52  2.48  2.46  2.46  2.48  2.52  2.57  2.62  2.68
       15  2.60  2.56  2.51  2.48  2.45  2.45  2.45  2.48  2.51  2.56  2.60
       20  2.60  2.56  2.51  2.48  2.45  2.43  2.43  2.44  2.46  2.49  2.53
       30  2.60  2.56  2.51  2.47  2.44  2.42  2.41  2.41  2.42  2.44  2.46
        ∞  2.60  2.56  2.51  2.47  2.43  2.40  2.37  2.35  2.34  2.33  2.33

  20   10  2.53  2.49  2.47  2.45  2.45  2.47  2.51  2.56  2.63  2.70  2.76
       12  2.53  2.49  2.47  2.45  2.44  2.45  2.48  2.52  2.57  2.62  2.68
       15  2.53  2.49  2.46  2.44  2.43  2.43  2.45  2.48  2.51  2.56  2.60
       20  2.53  2.49  2.46  2.44  2.42  2.42  2.42  2.44  2.46  2.49  2.53
       30  2.53  2.49  2.46  2.44  2.42  2.40  2.40  2.40  2.42  2.43  2.46
        ∞  2.53  2.49  2.46  2.43  2.40  2.38  2.36  2.34  2.33  2.33  2.33

  30   10  2.46  2.44  2.42  2.42  2.43  2.46  2.50  2.56  2.63  2.70  2.76
       12  2.46  2.44  2.42  2.41  2.42  2.44  2.47  2.52  2.57  2.62  2.68
       15  2.46  2.44  2.42  2.41  2.41  2.42  2.44  2.47  2.51  2.56  2.60
       20  2.46  2.43  2.42  2.40  2.40  2.40  2.42  2.44  2.46  2.49  2.53
       30  2.46  2.43  2.42  2.40  2.39  2.39  2.39  2.40  2.42  2.43  2.46
        ∞  2.46  2.43  2.41  2.39  2.37  2.36  2.35  2.34  2.33  2.33  2.33

   ∞   10  2.33  2.33  2.34  2.36  2.40  2.44  2.50  2.56  2.63  2.70  2.76
       12  2.33  2.33  2.34  2.36  2.38  2.42  2.46  2.51  2.57  2.62  2.68
       15  2.33  2.33  2.34  2.35  2.37  2.40  2.43  2.47  2.51  2.56  2.60
       20  2.33  2.33  2.33  2.34  2.36  2.38  2.40  2.43  2.46  2.49  2.53
       30  2.33  2.33  2.33  2.34  2.35  2.36  2.37  2.39  2.41  2.43  2.46
        ∞  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33  2.33

* $y$ is normally distributed about $\eta$ with variance $\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2$; $s_1^2$ and $s_2^2$ are independent estimates of $\sigma_1^2$ and $\sigma_2^2$ based on $f_1$ and $f_2$ degrees of freedom, respectively. $\lambda_1$ and $\lambda_2$ are known constants.

In the problem of comparing the means of samples taken from two normal populations, put $y = (\bar{x}_1 - \bar{x}_2)$, $f_1 = (n_1 - 1)$, $f_2 = (n_2 - 1)$, $\lambda_1 = 1/n_1$, $\lambda_2 = 1/n_2$, where $n_1$ and $n_2$ are the sample sizes.
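Using Table 2 (or Table 3) for two samples requires $v$, the column argument $\lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$, and the degrees of freedom $f_1$, $f_2$. A minimal sketch using the substitutions in the footnote; the function name and the summary figures in the usage line are hypothetical, chosen only so that $f_1 = 10$ and $f_2 = 12$ fall on tabulated rows.

```python
import math


def aspin_welch_inputs(n1, mean1, var1, n2, mean2, var2, eta=0.0):
    """Return (v, c, f1, f2) for the two-sample problem, with
    y = mean1 - mean2, lambda_i = 1/n_i, f_i = n_i - 1, eta the hypothesized
    difference, and c = lambda1*s1**2 / (lambda1*s1**2 + lambda2*s2**2),
    the column argument of the table."""
    a1, a2 = var1 / n1, var2 / n2  # lambda_i * s_i**2
    v = (mean1 - mean2 - eta) / math.sqrt(a1 + a2)
    c = a1 / (a1 + a2)
    return v, c, n1 - 1, n2 - 1


# Hypothetical summary figures: n1 = 11 and n2 = 13 give f1 = 10, f2 = 12.
v, c, f1, f2 = aspin_welch_inputs(11, 25.0, 8.0, 13, 22.0, 4.0)
print(f"v = {v:.2f}, column = {c:.2f}, f1 = {f1}, f2 = {f2}")
```

These figures give v ≈ 2.95 with a column argument of about 0.70. The tabulated 1 percent point for $f_2 = 12$, $f_1 = 10$ in the 0.7 column is 2.56, so v would be judged significant at that (one-sided) level. Column arguments falling between the tabulated values call for interpolation.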


Table 3. Value of $v = (y - \eta)/\sqrt{\lambda_1 s_1^2 + \lambda_2 s_2^2}$ exceeded with probability $\epsilon = 0.05$*
[or of $|v|$ exceeded with probability $2\epsilon = 0.10$]

Columns: $\lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$

  f2   f1   0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
   6    6  1.94  1.90  1.85  1.80  1.76  1.74  1.76  1.80  1.85  1.90  1.94
        8  1.94  1.90  1.85  1.80  1.76  1.73  1.74  1.76  1.79  1.82  1.86
       10  1.94  1.90  1.85  1.80  1.76  1.73  1.73  1.74  1.76  1.78  1.81
       15  1.94  1.90  1.85  1.80  1.76  1.73  1.71  1.71  1.72  1.73  1.75
       20  1.94  1.90  1.85  1.80  1.76  1.73  1.71  1.70  1.70  1.71  1.72
        ∞  1.94  1.90  1.85  1.80  1.76  1.72  1.69  1.67  1.66  1.65  1.64

   8    6  1.86  1.82  1.79  1.76  1.74  1.73  1.76  1.80  1.85  1.90  1.94
        8  1.86  1.82  1.79  1.76  1.73  1.73  1.73  1.76  1.79  1.82  1.86
       10  1.86  1.82  1.79  1.76  1.73  1.72  1.72  1.74  1.76  1.78  1.81
       15  1.86  1.82  1.79  1.76  1.73  1.71  1.71  1.71  1.72  1.73  1.75
       20  1.86  1.82  1.79  1.76  1.73  1.71  1.70  1.70  1.70  1.71  1.72
        ∞  1.86  1.82  1.79  1.75  1.72  1.70  1.68  1.66  1.65  1.65  1.64

  10    6  1.81  1.78  1.76  1.74  1.73  1.73  1.76  1.80  1.85  1.90  1.94
        8  1.81  1.78  1.76  1.74  1.72  1.72  1.73  1.76  1.79  1.82  1.86
       10  1.81  1.78  1.76  1.73  1.72  1.71  1.72  1.73  1.76  1.78  1.81
       15  1.81  1.78  1.76  1.73  1.72  1.70  1.70  1.71  1.72  1.73  1.75
       20  1.81  1.78  1.76  1.73  1.71  1.70  1.69  1.69  1.70  1.71  1.72
        ∞  1.81  1.78  1.76  1.73  1.71  1.69  1.67  1.66  1.65  1.65  1.64

  15    6  1.75  1.73  1.72  1.71  1.71  1.73  1.76  1.80  1.85  1.90  1.94
        8  1.75  1.73  1.72  1.71  1.71  1.71  1.73  1.76  1.79  1.82  1.86
       10  1.75  1.73  1.72  1.71  1.70  1.70  1.72  1.73  1.76  1.78  1.81
       15  1.75  1.73  1.72  1.70  1.70  1.69  1.70  1.70  1.72  1.73  1.75
       20  1.75  1.73  1.72  1.70  1.69  1.69  1.69  1.69  1.70  1.71  1.72
        ∞  1.75  1.73  1.72  1.70  1.68  1.67  1.66  1.65  1.65  1.65  1.64

  20    6  1.72  1.71  1.70  1.70  1.71  1.73  1.76  1.80  1.85  1.90  1.94
        8  1.72  1.71  1.70  1.70  1.70  1.71  1.73  1.76  1.79  1.82  1.86
       10  1.72  1.71  1.70  1.69  1.69  1.70  1.71  1.73  1.76  1.78  1.81
       15  1.72  1.71  1.70  1.69  1.69  1.69  1.69  1.70  1.72  1.73  1.75
       20  1.72  1.71  1.70  1.69  1.68  1.68  1.68  1.69  1.70  1.71  1.72
        ∞  1.72  1.71  1.70  1.68  1.67  1.66  1.66  1.65  1.65  1.65  1.64

   ∞    6  1.64  1.65  1.66  1.67  1.69  1.72  1.76  1.80  1.85  1.90  1.94
        8  1.64  1.65  1.65  1.66  1.68  1.70  1.72  1.75  1.79  1.82  1.86
       10  1.64  1.65  1.65  1.66  1.67  1.69  1.71  1.73  1.76  1.78  1.81
       15  1.64  1.65  1.65  1.65  1.66  1.67  1.68  1.70  1.72  1.73  1.75
       20  1.64  1.65  1.65  1.65  1.66  1.66  1.67  1.68  1.70  1.71  1.72
        ∞  1.64  1.64  1.64  1.64  1.64  1.64  1.64  1.64  1.64  1.64  1.64

* $y$ is normally distributed about $\eta$ with variance $\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2$; $s_1^2$ and $s_2^2$ are independent estimates of $\sigma_1^2$ and $\sigma_2^2$ based on $f_1$ and $f_2$ degrees of freedom, respectively. $\lambda_1$ and $\lambda_2$ are known constants.

In the problem of comparing the means of samples taken from two normal populations, put $y = (\bar{x}_1 - \bar{x}_2)$, $f_1 = (n_1 - 1)$, $f_2 = (n_2 - 1)$, $\lambda_1 = 1/n_1$ and $\lambda_2 = 1/n_2$, where $n_1$ and $n_2$ are the sample sizes.


Table 4. Comparison of butterfat tests for a selected producer
Example 1

The data in Table 1 relate to an experiment in which three treatments are being compared, a single characteristic $x$ being measured. As mentioned in paragraph 1 above, the $y_t$ are now the means $\bar{x}_t$ for the three treatments. The $s_t^2$ are the individual 'within-treatment' variances, estimated on $f_t$ degrees of freedom, equal to one less than the number of replicates in each case. Also $\lambda_t = 1/n_t$. Hence $w_t = n_t/s_t^2$.


Table 1

Treatment  Number of     Treatment  Observed   Estimated variance   w_t
(t)        individuals   mean       variance   of mean
           (n_t)         (x̄_t)      (s_t²)     (s_t²/n_t)
1          20            27.8       60.1       3.00                 0.333
2          10            24.1        6.3       0.63                 1.587
3          10            22.2       15.4       1.54                 0.649
Total                                                               2.569


If the true means are all equal, the best estimate of the common $\mu$ will be given by $z = (\sum w_t \bar{x}_t)/(\sum w_t) = 24.10$. The details required in this calculation, and further quantities needed later, are shown in Table 2. An arbitrary origin, $x = 25$, is being used.

Table 2

t       w_t(x̄_t − 25)  w_t(x̄_t − 25)²  w_t/Σw_t  (1 − w_t/Σw_t)²  (1 − w_t/Σw_t)²/f_t
1         0.932          2.612           0.130     0.757            0.0398
2        −1.428          1.285           0.618     0.146            0.0162
3        −1.817          5.088           0.253     0.558            0.0620
Total    −2.313          8.985                                      0.1180

We have

$$\sum_t w_t(\bar{x}_t - z)^2 = 8.985 - \frac{(2.313)^2}{2.569} = 6.90 \qquad (31)$$

Substitution into (29), with $k = 3$ so that $1/(k - 1) = 1/2$ and $2(k - 2)/(k^2 - 1) = 1/4$, then gives

$$\frac{\tfrac{1}{2}(6.90)}{1 + \tfrac{1}{4}(0.1180)} = \frac{3.45}{1.029} = 3.35 \qquad (32)$$

This must be referred to the variance-ratio table entered with $f_1 = 2$ and

$$f_2 = \left[\tfrac{3}{8}(0.1180)\right]^{-1} = \frac{8}{0.354} = 22.6 \qquad (33)$$

The 5 percent point for F, corresponding to these numbers of degrees of freedom, is 3.43. If, therefore, this point is taken as critical, the experiment just fails to demonstrate that the true means differ.
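The arithmetic of the example can be re-run as a check. The sketch below follows the same hand-computing device: it works from the arbitrary origin 25 and corrects the sum of squares by $[\sum w_t(\bar{x}_t - 25)]^2/\sum w_t$. It assumes the first treatment mean is 27.8, the value implied by the entry 0.932 = 0.333 × 2.8 in Table 2. Carrying unrounded weights gives $\sum (1 - w_t/\sum w_t)^2/f_t \approx 0.1182$ rather than the tabulated 0.1180, which does not change the conclusion.

```python
ORIGIN = 25.0  # arbitrary working origin used in the example

ns = [20, 10, 10]              # n_t
means = [27.8, 24.1, 22.2]     # treatment means x_t (first mean read as 27.8)
variances = [60.1, 6.3, 15.4]  # s_t^2, each on f_t = n_t - 1 d.f.
k = len(ns)

w = [n / s2 for n, s2 in zip(ns, variances)]                        # w_t = n_t / s_t^2
total_w = sum(w)
col1 = [wt * (x - ORIGIN) for wt, x in zip(w, means)]               # w_t (x_t - 25)
col2 = [wt * (x - ORIGIN) ** 2 for wt, x in zip(w, means)]          # w_t (x_t - 25)^2
col5 = [(1 - wt / total_w) ** 2 / (n - 1) for wt, n in zip(w, ns)]  # (1 - w_t/sum w)^2 / f_t

z = ORIGIN + sum(col1) / total_w             # weighted mean of the treatment means
ss = sum(col2) - sum(col1) ** 2 / total_w    # sum w_t (x_t - z)^2, as in (31)
a = sum(col5)
statistic = (ss / (k - 1)) / (1 + 2 * (k - 2) / (k ** 2 - 1) * a)  # as in (32)
f2 = (k ** 2 - 1) / (3 * a)                                        # as in (33)

print(f"z = {z:.2f}, sum of squares = {ss:.2f}, correction sum = {a:.4f}")
print(f"statistic = {statistic:.2f} on ({k - 1}, {f2:.1f}) d.f.")
```

The printed statistic is then compared with the tabulated 5 percent point of F for these degrees of freedom, exactly as in the hand calculation.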